Combining semantic and term frequency similarities for text clustering
https://nova.newcastle.edu.au/vital/access/ /manager/Repository/uon:42448
... n-gram corpora or thesauri, have been proposed in the literature. In this paper, the Frequency Google Tri-gram Measure is proposed to assess similarity between documents based on the frequencies of terms in the compared documents, with the Google n-gram corpus serving as an additional source of semantic similarity. Clustering algorithms are applied to several real datasets to experimentally evaluate the quality of the clusters obtained with the proposed measure and to compare it with a number of state-of-the-art measures from the literature. The experimental results demonstrate that, according to statistical tests, the proposed measure significantly improves the quality of document clustering. We further demonstrate that clustering results combining bag-of-words and semantic similarity are superior to those obtained with either approach independently.
Tue 23 Aug 2022 11:21:30 AEST

Food data integration by using heuristics based on lexical and semantic similarities
https://nova.newcastle.edu.au/vital/access/ /manager/Repository/uon:39960
Thu 30 Jun 2022 16:21:32 AEST